Add KEDA ScaledObject support to WorkerResourceTemplate by gibbonjj · Pull Request #285 · temporalio/temporal-worker-controller

gibbonjj · 2026-04-21T19:28:31Z

Summary

Adds KEDA ScaledObject as a supported kind under WorkerResourceTemplate, with a new token-substitution mechanism and a kind-aware scale-to-zero guard. Additive; HPA paths are unchanged.

Motivation

Kubernetes allows only one APIService registration per API group and version, so only one provider per cluster can serve external.metrics.k8s.io. In clusters running KEDA, that slot is occupied by keda-operator-metrics-apiserver, which prevents HorizontalPodAutoscaler with type: External metrics from resolving against any other provider. Users in this (common) configuration can't use the existing HPA-based WorkerResourceTemplate pattern for backlog-driven autoscaling; they need to produce KEDA-native ScaledObjects that KEDA watches directly.

The existing spec.metrics[*].external.metric.selector.matchLabels label injection doesn't apply to KEDA Prometheus triggers — those carry a freeform PromQL query string rather than a structured label selector. This PR introduces token substitution as a complementary, general-purpose mechanism.

What's in the PR

Token substitution engine (internal/k8s/tokens.go). Recursively substitutes three tokens in every string leaf of the rendered template: __TEMPORAL_WORKER_DEPLOYMENT_NAME__, __TEMPORAL_WORKER_BUILD_ID__, __TEMPORAL_NAMESPACE__. Values mirror the existing matchLabels injection 1:1. Unknown __FOO__-style tokens pass through unchanged. Runs before autoInjectFields in RenderWorkerResourceTemplate, so structured injection downstream never sees unresolved tokens.
Kind-aware scale-to-zero guard (api/v1alpha1/workerresourcetemplate_webhook.go). The existing unconditional minReplicas: 0 rejection is replaced with a kind switch: HPA guards minReplicas, ScaledObject guards both minReplicaCount and idleReplicaCount. Shared scaleToZeroRationale constant keeps both messages in sync. Same Temporal-side reason as before: approximate_backlog_count is not emitted when the task queue is idle with no pollers, so a metric-based autoscaler cannot detect new work after scaling to zero. The existing "minReplicas must not be 0" substring is preserved (two existing tests assert on it).
No change to scaleTargetRef injection. KEDA's ScaledObject.spec.scaleTargetRef accepts a superset of what the controller already injects (apiVersion: apps/v1, kind: Deployment, name). The existing recursive injection at internal/k8s/workerresourcetemplates.go works unchanged for KEDA; a render test proves it.
Envtest integration. Vendored stripped KEDA CRD at api/v1alpha1/testdata/keda/scaledobject-crd.yaml (uses x-kubernetes-preserve-unknown-fields: true — just enough for RESTMapper resolution; no upstream version coupling). webhook_suite_test.go points at it and adds ScaledObject to ALLOWED_KINDS. A happy-path admission test in workerresourcetemplate_webhook_integration_test.go exercises the full kube-apiserver → admission webhook → SAR path.
User-facing docs and example. examples/wrt-keda-prometheus.yaml parallels the existing HPA example. docs/worker-resource-templates.md gains a "Token substitution" section, a KEDA example section, and an updated "Allowed resource kinds and RBAC" entry. helm/temporal-worker-controller/values.yaml gets a commented-out ScaledObject stanza — default chart behavior is unchanged (users who don't run KEDA are unaffected).
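
To illustrate the token substitution item above, here is a minimal sketch of a recursive string-leaf substitution over a decoded template. It is not the code in internal/k8s/tokens.go: the `tokenValues` struct, the function names, and the PromQL label names in `main` are assumptions made for the example. Only the three token names and the pass-through behavior for unknown tokens come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenValues holds the three values named in the PR description.
// The struct and its field names are hypothetical.
type tokenValues struct {
	DeploymentName string
	BuildID        string
	Namespace      string
}

// substituteTokens walks a decoded YAML/JSON object and replaces the three
// known tokens in every string leaf. Map keys and non-string leaves are left
// untouched, and unknown __FOO__-style tokens pass through unchanged.
func substituteTokens(node any, v tokenValues) any {
	r := strings.NewReplacer(
		"__TEMPORAL_WORKER_DEPLOYMENT_NAME__", v.DeploymentName,
		"__TEMPORAL_WORKER_BUILD_ID__", v.BuildID,
		"__TEMPORAL_NAMESPACE__", v.Namespace,
	)
	return walk(node, r)
}

// walk returns a copy of node with r applied to every string leaf.
func walk(node any, r *strings.Replacer) any {
	switch n := node.(type) {
	case string:
		return r.Replace(n)
	case map[string]any:
		out := make(map[string]any, len(n))
		for k, val := range n {
			out[k] = walk(val, r)
		}
		return out
	case []any:
		out := make([]any, len(n))
		for i, val := range n {
			out[i] = walk(val, r)
		}
		return out
	default:
		// Numbers, booleans, and nil are returned as-is.
		return node
	}
}

func main() {
	// Hypothetical fragment of a ScaledObject spec. The PromQL label names
	// are illustrative and not taken from the PR.
	tmpl := map[string]any{
		"triggers": []any{
			map[string]any{
				"type": "prometheus",
				"metadata": map[string]any{
					"query":     `sum(approximate_backlog_count{worker_deployment="__TEMPORAL_WORKER_DEPLOYMENT_NAME__",build_id="__TEMPORAL_WORKER_BUILD_ID__"})`,
					"threshold": "10",
					"note":      "__FOO__ is not a known token",
				},
			},
		},
	}
	v := tokenValues{DeploymentName: "checkout-worker", BuildID: "v42", Namespace: "payments"}
	fmt.Println(substituteTokens(tmpl, v))
}
```

Because the walk never inspects field names, the same function would apply to any resource kind whose configuration carries freeform strings, which is the schema-agnostic property the description relies on.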

Design choices worth flagging

Tokens vs. query rewriting. A plausible alternative was to recognize triggers[*] where type: prometheus and rewrite metadata.query to append per-version label filters. This would require parsing PromQL, which is brittle and couples the controller to KEDA's schema. Token substitution is schema-agnostic: it works for any current or future CRD whose metric configuration uses freeform strings.
ScaledJob out of scope. ScaledJob is a different architectural pattern (per-task Jobs, no scaleTargetRef to a Deployment). Revisit if users ask.
Scale-to-zero rejected symmetrically for both knobs. KEDA's idleReplicaCount: 0 is the primary scale-to-zero mechanism; minReplicaCount: 0 is the secondary. Both are blocked for consistency with HPA policy and because the underlying Temporal-side metric behavior applies equally. If Temporal later emits a metric that survives queue idleness, the guard can relax.
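
The kind-aware guard described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the webhook code in api/v1alpha1/workerresourcetemplate_webhook.go: the function names, the decoded-map input, and the ScaledObject error wording are hypothetical. The field names, the "minReplicas must not be 0" substring, and the rationale text come from the description. The sketch rejects only explicit zeros; how the real webhook treats unset fields is not stated in the PR.

```go
package main

import "fmt"

// scaleToZeroRationale is shared by every rejection message so the wording
// stays in sync across kinds.
const scaleToZeroRationale = "approximate_backlog_count is not emitted when the task queue is idle with no pollers, so a metric-based autoscaler cannot detect new work after scaling to zero"

// checkScaleToZero rejects specs that explicitly allow scaling to zero.
// kind is the resource kind and spec is its decoded .spec (hypothetical shape).
func checkScaleToZero(kind string, spec map[string]any) error {
	switch kind {
	case "HorizontalPodAutoscaler":
		if isZero(spec["minReplicas"]) {
			return fmt.Errorf("minReplicas must not be 0: %s", scaleToZeroRationale)
		}
	case "ScaledObject":
		// Both KEDA knobs are guarded symmetrically.
		for _, field := range []string{"minReplicaCount", "idleReplicaCount"} {
			if isZero(spec[field]) {
				return fmt.Errorf("%s must not be 0: %s", field, scaleToZeroRationale)
			}
		}
	}
	return nil
}

// isZero reports whether v is a numeric zero. JSON decoding yields float64,
// while YAML decoders commonly yield int or int64. A missing field (nil)
// is not treated as zero in this sketch.
func isZero(v any) bool {
	switch n := v.(type) {
	case int:
		return n == 0
	case int64:
		return n == 0
	case float64:
		return n == 0
	}
	return false
}

func main() {
	fmt.Println(checkScaleToZero("HorizontalPodAutoscaler", map[string]any{"minReplicas": 0}))
	fmt.Println(checkScaleToZero("ScaledObject", map[string]any{"minReplicaCount": 1, "idleReplicaCount": 0}))
	fmt.Println(checkScaleToZero("ScaledObject", map[string]any{"minReplicaCount": 1}))
}
```

Keeping the rationale in one constant means a future relaxation of the guard, for example if Temporal later emits a metric that survives queue idleness, would touch a single switch rather than several message strings.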

Usage

Users opting in add the ScaledObject entry to workerResourceTemplate.allowedResources:

workerResourceTemplate:
  allowedResources:
    - kinds: ["HorizontalPodAutoscaler"]
      apiGroups: ["autoscaling"]
      resources: ["horizontalpodautoscalers"]
    - kinds: ["ScaledObject"]
      apiGroups: ["keda.sh"]
      resources: ["scaledobjects"]

Then author a single WorkerResourceTemplate with tokens in the PromQL query; the controller renders one ScaledObject per active Build ID at reconcile time. See examples/wrt-keda-prometheus.yaml and the updated docs/worker-resource-templates.md for full walkthroughs.

Test plan

go test ./... — all packages pass, including the new unit tests (token engine, render-level substitution, ScaledObject render determinism) and envtest integration test (happy-path admission of a ScaledObject-backed WRT).
go vet ./... — clean.
Existing HPA webhook tests still pass; the "minReplicas must not be 0" error substring is preserved.
Helm chart templates render (manual check of the rbac.yaml template with the new ScaledObject entry uncommented).
End-to-end verification against a real KEDA installation — out of scope for this PR; the envtest path proves admission and rendering, not live scaling behavior.

Commit structure

13 commits in strict TDD alternation (failing test → implementation → next failing test → …). Happy to squash on merge if preferred.

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three fixes from final branch review:

- RenderWorkerResourceTemplate processing-order godoc now lists token substitution as step 2 and renumbers downstream steps.
- validateWorkerResourceTemplateSpec step comments close the 3 → 5 gap (the scale-to-zero switch is now labeled step 4).
- scaledObjectForIntegration comment no longer implies the webhook does token substitution (substitution is at controller render time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CLAassistant · 2026-04-21T19:28:38Z

All committers have signed the CLA.

Adds examples/wrt-keda-temporal.yaml showing the `type: temporal` trigger (KEDA >= 2.17) as a complement to the existing Prometheus-trigger example. The native scaler queries Temporal's gRPC API directly: there is no metrics pipeline dependency, and per-buildId scoping is native via the `buildId` metadata field rather than a metric label.

Updates docs/worker-resource-templates.md to:

- Frame the two trigger types with a clear "prefer native when X, prefer Prometheus when Y" rubric.
- Restructure the KEDA example into Prometheus and native-temporal subsections, each with a full WRT spec.
- Clarify that token substitution covers any string field, making both trigger types work with the same substitution mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

carlydf · 2026-04-23T02:14:00Z

Hi @gibbonjj, thank you for building this! We plan to support KEDA ScaledObject as a WorkerResourceTemplate as soon as possible.

The currently-released temporal KEDA trigger does not correctly query per-version task queue backlog for Worker Deployment versions, so this PR won't work well for scaling until that changes. The good news is, we have a PR to KEDA adding that support which is almost merged, and it is on the required PRs list for the v2.20 release.

Once KEDA v2.20 is released with support for Worker Deployments, we will support it in the controller. You can track the issue here: #286

gibbonjj and others added 13 commits April 21, 2026 12:26

test: add failing tests for token substitution engine

49dedcc

feat: add token substitution engine for WorkerResourceTemplate

7e403fd

test: render must substitute temporal tokens in template

41d1beb

feat: substitute temporal tokens during RenderWorkerResourceTemplate

ac1b073

test: render ScaledObject templates with scaleTargetRef + tokens

69edbda

test: webhook must reject ScaledObject scale-to-zero configurations

24214f1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: extract scale-to-zero rationale and extend guard to ScaledObject

d8603ae

test: vendor stripped KEDA ScaledObject CRD for envtest

86adaa1

test: add KEDA ScaledObject CRD and kind to envtest setup

f67292a

test: envtest happy-path for ScaledObject admission

1e396c6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add example ScaledObject WorkerResourceTemplate

79f1f5b

docs: document KEDA ScaledObject support and token substitution

74b1fa1

gibbonjj requested review from a team and jlegrone as code owners April 21, 2026 19:28

gibbonjj wants to merge 14 commits into temporalio:main from gibbonjj:keda-scaledobject-support


3 participants
